This listing of claims will replace all prior versions, and listings, of claims in the application:



LISTING OF CLAIMS:
Claims 1-21 (cancelled) 



22. (new) A method for gene mapping to locate a gene associated with a certain phenotype
from a dataset of genotype and phenotype data, by analyzing associations between genetic markers
m_i, which are polymorphic nucleic acid or protein sequences or strings of single-nucleotide
polymorphisms deriving from a chromosomal region, comprising

i) searching from said dataset for all marker patterns P that satisfy a pattern evaluation
function e(P), wherein

a. the marker patterns are expressions within said dataset comprising the marker- 
allele assignments and zero or more of the following: individual covariates, 
environmental variables and auxiliary phenotypes; and 

b. the pattern evaluation function e(P) is true if and only if there is a strong 
association between the marker pattern P and the phenotype being studied, 

by testing each marker of pattern P against the corresponding allele pair in genotype G, 
effectively finding out if there is a possible haplotype configuration of G which matches P 
and counting the possible matches as matches, 

ii) scoring each marker m_i of said dataset with a marker score s(m_i), which is a function of
the set S_i defined as the set of marker patterns overlapping the marker m_i and satisfying the
pattern evaluation function e as defined in step i), and

iii) locating said gene to the marker m_i having the best score s(m_i), wherein the best score
is the highest obtained score if said scoring function is designed to give higher scores closer
to the gene, or the lowest obtained score if said scoring function is designed to give lower
scores closer to the gene.
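
The matching test of step i) can be illustrated with a minimal Python sketch. It assumes a hypothetical representation that is not part of the claims: a pattern is a list with one entry per marker, where `"*"` means "no assignment", and an unphased genotype is a list of allele pairs. Because phase is unknown, a pattern is counted as a match whenever each assigned allele occurs in the corresponding allele pair. Missing data is not handled here.

```python
WILDCARD = "*"  # hypothetical "don't care" symbol for unassigned markers


def matches(pattern, genotype):
    """Return True if some haplotype configuration of `genotype` could match `pattern`.

    Each assigned allele of the pattern is tested against the unordered
    allele pair observed at the same marker.
    """
    return all(
        allele == WILDCARD or allele in pair
        for allele, pair in zip(pattern, genotype)
    )


pattern = ["*", "2", "*", "1"]
genotype_a = [("1", "2"), ("2", "3"), ("1", "1"), ("1", "4")]
genotype_b = [("1", "2"), ("3", "3"), ("1", "1"), ("1", "4")]

print(matches(pattern, genotype_a))  # True: allele 2 and allele 1 are both present
print(matches(pattern, genotype_b))  # False: marker 2 carries only allele 3
```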
23. (new) The method of claim 22, wherein a marker is scored as the sum of the weights of
overlapping patterns. 
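
As a hedged illustration of weighted marker scoring and of step iii) of claim 22, the sketch below sums pattern weights over every marker a pattern overlaps and then picks the best-scoring marker. It assumes, for illustration only, that a pattern "overlaps" all markers between its first and last assigned position, gaps included. The helper names and the example weights are hypothetical.

```python
WILDCARD = "*"  # hypothetical "don't care" symbol for unassigned markers


def span(pattern):
    """First and last marker index that carries an allele assignment."""
    positions = [i for i, allele in enumerate(pattern) if allele != WILDCARD]
    return positions[0], positions[-1]


def marker_scores(patterns, weights, n_markers):
    """Score each marker as the sum of the weights of the patterns overlapping it."""
    scores = [0.0] * n_markers
    for pattern, weight in zip(patterns, weights):
        first, last = span(pattern)
        for i in range(first, last + 1):
            scores[i] += weight
    return scores


def best_marker(scores, higher_is_better=True):
    """Index of the marker with the best score (ties go to the lowest index)."""
    pick = max if higher_is_better else min
    return pick(range(len(scores)), key=scores.__getitem__)


patterns = [
    ["*", "1", "2", "*", "*"],
    ["*", "*", "2", "*", "1"],
    ["3", "1", "*", "*", "*"],
]
weights = [2.0, 1.5, 1.0]

scores = marker_scores(patterns, weights, n_markers=5)
print(scores)               # [1.0, 3.0, 3.5, 1.5, 1.5]
print(best_marker(scores))  # 2
```

Setting every weight to 1 reduces the score to a simple count of overlapping patterns.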

24. (new) The method of claim 23, wherein the weight of a pattern is a function of

the uncertainty of matching, e.g. 2 ..., where N[i] is the number of heterozygous
markers within the pattern in genotype i, summed over all matched genotypes, or

the informativeness of the pattern, e.g. 2 H, where H is the average heterozygosity within
the pattern, or

the strength of association, e.g. chi-squared.

25. (new) The method of claim 22, wherein the marker patterns P are searched by the
following algorithm:

Input

• set U of possible marker patterns

• evaluation function e(P) for patterns P in U

• (generalization) relation < for patterns in U

• where the function e and the relation < are such that if e(P) is true and P' < P, then e(P')
is also true

Output

• set S = {P in U | e(P) is true} of patterns

Method

S := {}
// Initialize the set of evaluated patterns:
E := {}
// Start with the most general patterns:
Gen := {P in U | there is no P' in U, P' != P, such that P' < P}
// Recursively evaluate patterns in a depth-first order:
foreach P in Gen { evaluatePatterns(P) }
end;

procedure evaluatePatterns(P) {
  insert P into the set E
  if e(P) = true then {
    insert P into set S
    // Find all specializations of P that have not been tested yet, and
    // evaluate them recursively:
    Spec := {P' in U-E | P < P', P' != P, and there is no P'' in U-E, P'' != P
             and P'' != P', with P < P'' < P'};
    foreach P' in Spec { evaluatePatterns(P'); }
  }
}
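
The depth-first search above can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed procedure itself: patterns are modelled as frozensets of (marker index, allele) assignments, the most general patterns are taken to be the single assignments, e(P) is a toy frequency test on four made-up haplotypes, and already-evaluated patterns are skipped at visit time rather than by building the set Spec up front.

```python
def depth_first_search(most_general, specializations, e):
    """Collect every pattern reachable from `most_general` that satisfies `e`.

    Relies on the property that every generalization of a satisfying
    pattern also satisfies `e`, so failing patterns need not be extended.
    """
    satisfied, evaluated = set(), set()  # the sets S and E

    def evaluate(pattern):
        evaluated.add(pattern)
        if e(pattern):
            satisfied.add(pattern)
            for special in specializations(pattern):
                if special not in evaluated:
                    evaluate(special)

    for pattern in most_general:
        if pattern not in evaluated:
            evaluate(pattern)
    return satisfied


# Toy data (hypothetical): four haplotypes over four markers.
haplotypes = ["1212", "1222", "1211", "2212"]
n_markers = 4
alleles = [sorted({h[i] for h in haplotypes}) for i in range(n_markers)]


def frequency(pattern):
    """Number of haplotypes that carry every assignment of the pattern."""
    return sum(all(h[i] == a for i, a in pattern) for h in haplotypes)


def e(pattern):
    return frequency(pattern) >= 2


def specializations(pattern):
    """Immediate specializations: add one assignment at an unassigned marker."""
    used = {i for i, _ in pattern}
    return [
        pattern | {(i, a)}
        for i in range(n_markers) if i not in used
        for a in alleles[i]
    ]


most_general = [frozenset({(i, a)}) for i in range(n_markers) for a in alleles[i]]
found = depth_first_search(most_general, specializations, e)
print(len(found), "patterns satisfy e")
```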

26. (new) The method of claim 22, wherein the marker patterns P are searched by the 
following algorithm: 

Input 

• set U of possible marker patterns 

• evaluation function e(P) for patterns P in U

• frequency threshold x

Output

• set S = {P in U | e(P) and ae(P) is true} of patterns, where ae(P) is true if and only if the
frequency of pattern P exceeds a given threshold x

Method

S := {}
// Initialize the set of evaluated patterns:
E := {}
// Start with the most general patterns:
Gen := {P in U | there is no P' in U, P' != P, such that P -> P'}
// Recursively evaluate patterns in a depth-first order:
foreach P in Gen { evaluatePatterns(P) }
end

procedure evaluatePatterns(P) {
  insert P into the set E
  if ae(P) = true then {
    if e(P) = true then insert P into set S
    // Find all specializations of P that have not been tested yet, and evaluate
    // them recursively:
    Spec := {P' in U-E | P' -> P, P' != P, and there is no P'' in U-E, P'' != P
             and P'' != P', with P' -> P'' and P'' -> P}
    foreach P' in Spec { evaluatePatterns(P') }
  }
}

27. (new) The method of claim 22, wherein the marker patterns P are searched by the 
following algorithm: 

Input 

• marker map M = (m_1, ..., m_k)

• phenotype vector Y = (Y_1, ..., Y_n)

• genotype matrix H of size n × k × 2 (n persons, k markers, 2 alleles per person and
marker)

• association threshold x for chi-squared test

• maximum pattern length l

• maximum number of gaps g

• maximum gap size s

Output

• set S = {P in U | e(P) is true} of patterns,

• where U consists of patterns on M that consist of marker-allele assignments and that
adhere to parameters l, g, and s, and

• where e(P) is true if and only if chi-squared test on P using genotype matrix H and
phenotypes Y exceeds the given threshold x

Method

S := {}
// Number of case and control persons:
pi_A := number of affected persons;
pi_C := number of control persons;
pi := pi_A + pi_C
// A lower bound for pattern frequency:
lb := pi_A * pi * x / (pi_C * pi + pi_A * x)
// Variable for iterating over different patterns:
P = (p_1, ..., p_k) := ('*', ..., '*')
for i := 1 to k {
  // alleles(m_i) is the set of alleles of the i:th marker
  foreach a in alleles(m_i) {
    p_i := a
    // Test pattern P and all its extensions:
    checkPatterns(P, i, i, 0, 0)
    // Reset p_i
    p_i := '*'
  }
}
end

// Test haplotype pattern P and all patterns that can be generated by extending P
// from the right:
procedure checkPatterns(P, start, i, nr_of_gaps, gap_length) {
  // Output strongly associated patterns
  if chi-squared(P, M, H, Y) >= x and p_i != '*' then insert P into set S
  // Return if extended patterns would be too long:
  if i = k or i+1-start > l then return
  // Return if extended patterns can not be strongly disease-associated:
  if frequency of P in affected persons is less than lb
  then return;
  // Create and test legal extensions of current pattern P (3 cases):
  // 1. Give marker i+1 all possible values:
  foreach a in alleles(m_(i+1)) {
    p_(i+1) := a
    checkPatterns(P, start, i+1, nr_of_gaps, 0)
  }
  // 2. Introduce a new gap starting at marker i+1:
  if p_i != '*' and nr_of_gaps < g and s > 1 then {
    p_(i+1) := '*'
    checkPatterns(P, start, i+1, nr_of_gaps+1, 1)
  }
  // 3. Extend the current gap over marker i+1:
  if p_i = '*' and gap_length < s then {
    p_(i+1) := '*'
    checkPatterns(P, start, i+1, nr_of_gaps, gap_length+1)
  }
  // Before returning, reset p_(i+1):
  p_(i+1) := '*'
  return
}
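
The frequency lower bound lb used for pruning appears consistent with the following reasoning, sketched here as an assumption rather than as the applicant's stated derivation: for a pattern carried by f affected persons and by no control persons (the most favourable case), the Pearson chi-squared statistic of the 2 × 2 table reduces to pi · f · pi_C / ((pi − f) · pi_A), and requiring this to reach x gives f >= pi_A · pi · x / (pi_C · pi + pi_A · x). The Python sketch below checks this numerically; the function names and the example counts are hypothetical.

```python
def chi_squared(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the 2 x 2 table
    a = affected with pattern,    b = controls with pattern,
    c = affected without pattern, d = controls without pattern."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def frequency_lower_bound(n_affected, n_control, x):
    """Smallest frequency among affected persons at which chi-squared can reach x."""
    n = n_affected + n_control
    return n_affected * n * x / (n_control * n + n_affected * x)


n_affected, n_control, x = 100, 100, 9.0
lb = frequency_lower_bound(n_affected, n_control, x)
print(round(lb, 2))  # 8.61

# Best case (no controls carry the pattern): 8 carriers fall short, 9 suffice.
print(chi_squared(8, 0, n_affected - 8, n_control) >= x)  # False
print(chi_squared(9, 0, n_affected - 9, n_control) >= x)  # True
```

Because extending a pattern cannot raise its frequency, a pattern whose frequency among affected persons is already below lb cannot yield an extension that reaches the threshold, which is why the recursion may stop there.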

28. (new) The method of claim 22, wherein the marker patterns P are searched by the 
following algorithm: 

Input 

• set U of possible marker patterns 

• evaluation function e(P) for patterns P in U 

• (generalization) relation < for patterns in U, where the function e and the relation < are
such that if e(P) is true and P' < P, then e(P') is also true

Output 

• set S = {P in U | e(P) is true} of patterns

Definitions

• function Lgg: U -> 2^U, Lgg(P) = {P' in U | P > P' and P' != P and there is no P'' in U
such that P != P'' != P' and P > P'' > P'}, the set of least general generalizations of pattern
P.

• function Lss: U -> 2^U, Lss(P) = {P' in U | P < P' and P' != P and there is no P'' in U
such that P != P'' != P' and P < P'' < P'}, the set of least special specializations of pattern
P.

Method 

S := {}
Q := {}
// Start with the most general patterns:
F := {P in U | there is no P' in U, P' != P, such that P' < P};
while F != {} {
  // Evaluate the candidate patterns:
  foreach P in F {
    if e(P) = true then insert P into set S
    else remove P from set F
  }
  Q := Q union F
  // Generate a new set of candidate patterns:
  C := {}
  foreach P in F {
    C := C union {P' in U | P' in Lss(P) and for all P'' in Lgg(P'):
                  P'' in Q}
  }
  F := C
}
end
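
A level-wise (breadth-first) counterpart of the search can be sketched in Python as below. The pattern representation, the toy haplotypes and the helper names are assumptions made for illustration: patterns are frozensets of (marker index, allele) assignments, Lss adds one assignment, Lgg removes one, and the most general patterns are the single assignments.

```python
def levelwise_search(most_general, lss, lgg, e):
    """Breadth-first search: a candidate is generated only if all of its
    least general generalizations were accepted on earlier levels."""
    satisfied = set()             # S
    accepted = set()              # Q
    frontier = set(most_general)  # F
    while frontier:
        # Evaluate the candidates and drop those that fail e.
        frontier = {p for p in frontier if e(p)}
        satisfied |= frontier
        accepted |= frontier
        # Generate the next level of candidates.
        candidates = set()
        for p in frontier:
            for q in lss(p):
                if all(g in accepted for g in lgg(q)):
                    candidates.add(q)
        frontier = candidates
    return satisfied


# Toy data (hypothetical): four haplotypes over four markers.
haplotypes = ["1212", "1222", "1211", "2212"]
n_markers = 4
alleles = [sorted({h[i] for h in haplotypes}) for i in range(n_markers)]


def e(pattern):
    """Toy evaluation function: pattern occurs in at least two haplotypes."""
    return sum(all(h[i] == a for i, a in pattern) for h in haplotypes) >= 2


def lss(pattern):
    """Least special specializations: add one assignment at an unassigned marker."""
    used = {i for i, _ in pattern}
    return [
        pattern | {(i, a)}
        for i in range(n_markers) if i not in used
        for a in alleles[i]
    ]


def lgg(pattern):
    """Least general generalizations: drop one assignment."""
    return [pattern - {assignment} for assignment in pattern]


most_general = [frozenset({(i, a)}) for i in range(n_markers) for a in alleles[i]]
found = levelwise_search(most_general, lss, lgg, e)
print(len(found), "patterns satisfy e")
```

Compared with the depth-first variant, the check against Q avoids evaluating a candidate when any of its immediate generalizations has already failed.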

29. (new) The method of claim 22, wherein the marker patterns P are searched by the 
following algorithm: 

Input 

• set U of possible marker patterns

• evaluation function e(P) for patterns P in U

• frequency threshold x 

Output 

• set S = {P in U | e(P) and ae(P) is true} of patterns, where ae(P) is true if and only if the
frequency of pattern P exceeds a given threshold x

Definitions

• function Lgg: U -> 2^U, Lgg(P) = {P' in U | P -> P' and P' != P and there is no P'' in U
such that P != P'' != P' and P -> P'' -> P'}, the set of least general generalizations of pattern
P.

• function Lss: U -> 2^U, Lss(P) = {P' in U | P' -> P and P' != P and there is no P'' in U
such that P != P'' != P' and P' -> P'' -> P}, the set of least special specializations of pattern
P.

Method 

S := {}
Q := {}
// Start with the most general patterns:
F := {P in U | there is no P' in U, P' != P, such that P -> P'};
while F != {} {
  // Evaluate the candidate patterns:
  foreach P in F {
    if ae(P) = true then {
      if e(P) = true then insert P into set S
    }
    else remove P from set F
  }
  Q := Q union F
  // Generate a new set of candidate patterns:
  C := {}
  foreach P in F {
    C := C union {P' in U | P' in Lss(P) and for all P'' in Lgg(P'):
                  P'' in Q}
  }
  F := C
}
end

30. (new) The method of claim 22, wherein 

a) the phenotype being studied is qualitative, and 

b) the pattern evaluation function e(P) has the form e(P) = true if and only if e'(P) > x,
where e'(P) is the (signed) association measure χ² and x is a user-specified minimum
value, which is chosen so that the sizes of S_i are large enough, such as 7, to give
statistically sufficiently reliable estimates for the gene locus, and

c) the score s(m_i) of marker m_i is the size of S_i, also called marker-wise pattern
frequency of m_i and denoted by f(m_i).

31. (new) The method of claim 22, wherein

a) the pattern evaluation function e(P) has the form e(P) = true if and only if e'(P) >
x, where e'(P) is the absolute frequency of pattern P in the data and x is a user-specified
value, which is chosen so that the sizes of S_i are large enough, such as 20, to give
statistically sufficiently reliable estimates for the gene locus, and,

b) in order to derive the score s(m_i), the p value (statistical significance) of each
marker pattern P in determining the phenotype being studied is evaluated, and

c) the score s(m_i) is the distance between the observed p value distribution of patterns
in S_i and the uniform distribution, defined as average of (p_i - q_i) log (p_i / q_i) over all i =
1, ..., n, where n is the number of haplotype patterns in S_i, p_i is the ith smallest p value in S_i,
and q_i is the expectation of the ith smallest p value, if the p values were randomly drawn
from the uniform distribution.
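
The distance in part c) can be illustrated with a short Python sketch. It uses the standard result that the expected ith smallest of n independent uniform(0, 1) draws is i / (n + 1). The natural logarithm and strictly positive p values are assumptions of this sketch, since the claim does not fix the base of the logarithm.

```python
import math


def uniform_distance_score(p_values):
    """Average of (p_i - q_i) * log(p_i / q_i) over the sorted p values,
    where q_i = i / (n + 1) is the expected ith smallest uniform draw."""
    ordered = sorted(p_values)
    n = len(ordered)
    total = 0.0
    for i, p in enumerate(ordered, start=1):
        q = i / (n + 1)
        total += (p - q) * math.log(p / q)
    return total / n


# p values that sit exactly on their uniform expectations give distance 0.
print(uniform_distance_score([0.25, 0.5, 0.75]))  # 0.0

# Markedly small p values give a larger distance than roughly uniform ones.
print(uniform_distance_score([0.001, 0.002, 0.01]) >
      uniform_distance_score([0.2, 0.5, 0.8]))    # True
```

Each term is non-negative because (p − q) and log(p / q) always share the same sign, so the score is zero only when every observed p value equals its expectation.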

32. (new) The method of claim 31, wherein the p value is computed using a linear model of
form Y = β_1 X_1 + ... + β_k X_k + αZ + β_0, where the dependent variable Y is the phenotype being
studied, X_1 through X_k are covariates, such as environmental factors, and Z is a dummy variable for
the occurrence of the haplotype pattern, and
the coefficients α and β_i are adjusted for best fit, and then

the significance of Z as a covariate is assessed by using a t test with the null hypothesis "α = 0".

33. (new) The method of claim 22, wherein each score s(m_i) is refined by replacing it by the
marker-wise p value of the score s(m_i), where the statistical significance of s(m_i) is measured
against the null hypothesis that there is no gene effect.

34. (new) The method of claim 33, wherein the marker-wise p values p(m_i) are determined by
randomly permuting phenotypes.
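
A permutation procedure of this kind might look as follows in Python. The sketch is illustrative only: `score_fn` stands for any marker scoring function, the toy score and data are made up, and the "+ 1" in the numerator and denominator is a common convention (counting the observed labelling as one permutation) assumed here, not something stated in the claims.

```python
import random


def permutation_p_values(score_fn, genotypes, phenotypes,
                         n_permutations=1000, seed=0, higher_is_better=True):
    """Marker-wise p values: the share of phenotype permutations whose score
    at a marker is at least as extreme as the observed score."""
    rng = random.Random(seed)
    observed = score_fn(genotypes, phenotypes)
    hits = [0] * len(observed)
    shuffled = list(phenotypes)
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        permuted = score_fn(genotypes, shuffled)
        for i, (obs, perm) in enumerate(zip(observed, permuted)):
            extreme = perm >= obs if higher_is_better else perm <= obs
            if extreme:
                hits[i] += 1
    return [(h + 1) / (n_permutations + 1) for h in hits]


def toy_score(genotypes, phenotypes):
    """Hypothetical score: |carriers among affected - carriers among controls|."""
    n_markers = len(genotypes[0])
    scores = []
    for i in range(n_markers):
        affected = sum(g[i] for g, y in zip(genotypes, phenotypes) if y == 1)
        control = sum(g[i] for g, y in zip(genotypes, phenotypes) if y == 0)
        scores.append(abs(affected - control))
    return scores


# Marker 0 is carried by exactly the affected persons; marker 1 is unrelated.
genotypes = [(1, 0), (1, 1), (1, 0), (1, 1), (1, 0),
             (0, 1), (0, 0), (0, 1), (0, 0), (0, 1)]
phenotypes = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

p_values = permutation_p_values(toy_score, genotypes, phenotypes)
print(p_values)  # marker 0 gets a small p value, marker 1 a p value of 1.0
```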

35. (new) The method of claim 22, wherein the area returned from the prediction of the gene 
location is contiguous or fragmented or a point. 

36. (new) The method of claim 22, wherein the location of the gene, predicted as a function
of the scores s(m_i) and based on maximizing or minimizing the score, is predicted to the location
of the marker m_i that maximizes or minimizes the marker score s(m_i).

37. (new) The method of claim 22, wherein the location of the gene, predicted as a function of
the scores s(m_i) and based on maximizing or minimizing the score, is predicted to the combination
of most probable intervals for containing the trait-susceptibility locus that covers at most the desired
proportion ranging from 0 to 100 % of the region covered by markers m_i, obtained by taking all such
points in said region whose nearest marker is within the k best scoring markers, where k is selected
such that the resulting area has length at most t times the length of said region, and where k is
maximal such value.

38. (new) The method of claim 22, wherein the location of the gene, predicted as a function of
the scores s(m_i) and based on maximizing or minimizing the score, is predicted to those points in the
studied chromosomal region whose nearest marker scores at least y or at most y, where y is scoring
function dependent and is selected so that the probability of the gene being close to the marker is
sufficiently large.

39. (new) The method of claim 22, wherein multiple genes are searched by using marker
patterns that refer to different potential gene loci at the same time.

40. (new) A computer-readable data storage medium having computer-executable program
code stored thereon, operative to perform the method of claim 22 when executed on a computer.



41. (new) A computer system, which is programmed to perform the method of claim 39.